PALEVO

1631-0683

Elsevier

S1631-0683(13)00107-3

10.1016/j.crpv.2013.07.001

Research article

General palaeontology, systematics and evolution (Phylogenetic analysis)

Evaluating strategies of phylogenetic analyses by the coherence of their results

Laurin

Michel

Blaise

blaise.li@normalesup.org

Centro de Ciências do Mar, Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal

Volume 12, Issue 6

S1631-0683(13)X0007-7

Systematics beyond Phylogenetics / La systématique au-delà de la phylogénétique

Pages 381–387

2013

Académie des sciences

I propose an approach to identify, among several strategies of phylogenetic analysis, those producing the most accurate results. This approach is based on the hypothesis that the more a result is reproduced from independent data, the more it reflects the historical signal common to the analysed data. Under this hypothesis, the capacity of an analytical strategy to extract historical signal should correlate positively with the coherence of the obtained results. I apply this approach to a series of analyses on empirical data, basing the coherence measure on the Robinson–Foulds distances between the obtained trees. To a first approximation, the analytical strategies most suitable for the data produce the most coherent results. However, risks of false positives and false negatives are identified, and these are difficult to rule out.

Chloroplasts, Coherence, Cyanobacteria, Methods, Phylogeny

1 Introduction

The introduction of probabilistic approaches (Felsenstein, 1981 and Yang and Rannala, 1997), which directly and explicitly use models of molecular evolution, was an important breakthrough for molecular phylogeny reconstruction. These approaches usually reduce the occurrence of reconstruction artefacts, in particular in studies at large evolutionary scales (but see Simmons, 2012). In parallel with the increased availability of data (which permits a better estimation of the parameters of complex models) and of computational power (which permits the exploration and evaluation of a large number of possible trees), the development of probabilistic methods was accompanied by the development of models that take into account an increasing number of aspects of molecular evolution, such as heterogeneities in evolutionary rate (Yang, 1993) or in composition (Foster, 2004 and Lartillot and Philippe, 2004). The accuracy of phylogenies can also be enhanced by using character selection or recoding techniques (Brinkmann and Philippe, 1999, Goremykin et al., 2010, Hassanin et al., 2005, Inagaki et al., 2004 and Roure and Philippe, 2011).

However, the diversity of available methods and models makes it difficult to decide which strategy to adopt when trying to reconstruct a phylogeny. Some methods are available to help the phylogeneticist in this choice. For instance, programs like jModelTest (Posada, 2008) use a variety of criteria to select a model achieving a good compromise between realism and tractability. But such readily available tools are limited to the set of models implemented in the phylogeny programs on which they rely. It is also common practice to compare phylogenies obtained using different models by applying selection criteria identical to those used in a posteriori model selection programs, which extends these selection approaches to arbitrary models. Still, the model is only one aspect of the analytical strategy: data selection or recoding techniques also need to be chosen prior to tree construction, a program and its specific settings have to be chosen, and support evaluation procedures can take diverse forms. All of these aspects form the analytical strategy that leads from the raw data to an annotated tree ready for drawing phylogenetic conclusions.

An approach suitable for the choice of such integrated analytical strategies could be to make the choice a posteriori, based on their results. A variety of analyses would be performed, and the ones producing the most accurate results would be chosen. This immediately raises the question of how to evaluate the accuracy of a phylogeny reconstruction. Measures such as bootstrap proportions (Felsenstein, 1985) or Bayesian posterior probabilities are sometimes regarded as reliability indicators, but they must be interpreted in the limited context of the particular dataset that has been analysed. Other datasets may yield different support values (or even contradictory results) and these values do not correlate perfectly with one another (Douady et al., 2003). Reliability of phylogenetic relationships is arguably better estimated by considering trees obtained from several independent datasets, and examining the degree to which the results are reproduced across these datasets (Chen et al., 2003, Dettai and Lecointre, 2004, Li and Lecointre, 2009 and Miyamoto and Fitch, 1995). In this context, it has been observed that the reproducibility of the results was higher when a better modelling of the data was used (Miyamoto et al., 1994). This justifies the widespread practice of using more complex models and methods when the phylogeny appears more challenging to resolve. It also suggests that result coherence could indeed correlate positively with accuracy.

The purpose of the present article is to report an attempt to use the a posteriori approach for selecting strategies of phylogenetic analyses using the reproducibility of the results as a criterion, and to discuss some potential pitfalls of such an approach.

2 Materials and methods

2.1 Test data

The a posteriori approach was tested on empirical multi-gene data assembled as part of a yet-to-be-published work on the phylogeny of Cyanobacteria and plastids (Li et al., in preparation). Given the large evolutionary scale, as well as the potential existence of horizontal gene transfers, such a dataset should provide enough of a reconstruction challenge for different analytical strategies to differ in reconstruction accuracy and to show various degrees of result coherence.

The data consist of 73 protein-coding genes from 42 Cyanobacteria, plastids or nuclear genes of plastidial origin. The genes were grouped into 4 sets that were considered internally congruent and mutually incongruent by the concaterpillar program (Leigh et al., 2008). This program performs a series of likelihood ratio tests under a GTR + I + Γ model, to evaluate whether datasets can be forced to share topologies and branch lengths or whether separate trees provide a significantly better likelihood. Results of maximum likelihood analyses under a GTR + I + Γ model should therefore provide a reference situation where some incoherence effectively appears between the datasets. Strategies more accurate than maximum likelihood analysis under a GTR + I + Γ model might be able to recover, from each dataset, more of the history common to all datasets, and should therefore be characterised by a higher coherence in the results.

2.2 Analytical strategies tested

For each of the 4 combined datasets, a series of analytical strategies was applied. A name is associated with each of them to facilitate reporting and discussion of the results.

Maximum likelihood bootstrap analyses were conducted using RAxML versions 7.0.4 and 7.3.4 (Stamatakis, 2006) under a GTR + I + Γ model, with 200 pseudo-replicates of the data. These analyses were applied to the original data matrices, to their amino-acid translations (for which a CPREV + I + Γ model was used) and to versions of these matrices in which diverse combinations of sites were subjected to codon-degeneracy recodings.

A codon-degeneracy recoding is based on the replacement of codons by degenerate versions that represent all codons coding the same amino-acid. Nucleotides are replaced by IUPAC ambiguity codes at codon positions where several codons for the same amino-acid differ.

The goal of these recodings is to eliminate potentially misleading signal. The signal considered for removal corresponds to sites involved in codon synonymy. Due to the relaxed selection on the nucleotide at such sites, convergence between sequences sharing the same bias in their genome's nucleotide composition may have happened and mislead phylogenetic reconstruction (see for instance Cox et al., 2008, Foster, 2004, Hassanin et al., 2005, Nabholz et al., 2011 and Rota-Stabelli et al., 2013). The most useful of the recodings should affect mostly sites where the proportion of misleading signal is the highest, and eliminate only a small proportion of the historical signal. A higher coherence of the results is expected for such recodings.

For reasons pertaining to the organization of the genetic code (that will not be detailed here), three categories of codon positions were distinguished: third codon positions of any amino-acid, leucine and arginine first codon positions, serine first and second codon positions.

The maximum likelihood analytical strategies were the following (see also supplementary document available at http://dx.doi.org/10.6084/m9.figshare.732758 for the recodings):

‘unrecoded’: the original matrix is used, without codon-degeneracy recoding. This strategy is used as a reference, where some degree of incoherence is expected between the datasets delimited by concaterpillar;

‘degen3’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but only the third position is actually recoded;

‘degen1LR’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but only at first positions of codons coding for a leucine or an arginine;

‘degen12S’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but only at first and second positions of codons coding for a serine;

‘degen12LRS’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but only at first and second positions of codons coding for a leucine, an arginine or a serine;

‘degenLR3’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but if the codon codes for anything other than a leucine or an arginine, only the third position is actually recoded;

‘degenS3’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but if the codon codes for anything other than a serine, only the third position is actually recoded;

‘degenLRS3’: the codons are replaced with degenerate versions representing all synonymous codons in their family, but if the codon codes anything other than a leucine, an arginine or a serine, only the third position is actually recoded (since only leucine, arginine and serine are affected by codon synonymy at other positions than the third one, this actually amounts to replacing every (non-terminating) codon by the degenerate version representing its entire family);

‘translated’: the amino-acid translation of the matrix is used.
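As a rough illustration of the recodings listed above, the following Python sketch derives the degenerate form of a codon from the standard genetic code and applies it only at the positions targeted by a given strategy. The function names, the flag-based encoding of the strategies and the choice to leave terminating codons untouched are assumptions made for this example; the recodings actually used are those described in the supplementary document.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; the code table used in the study is not restated in the text).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the set of nucleotides they stand for.
IUPAC = {frozenset(nts): sym for sym, nts in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

# Hypothetical encoding of each strategy as three flags:
# (recode third positions, recode Leu/Arg first positions,
#  recode Ser first and second positions).
STRATEGIES = {
    "degen3": (True, False, False),
    "degen1LR": (False, True, False),
    "degen12S": (False, False, True),
    "degen12LRS": (False, True, True),
    "degenLR3": (True, True, False),
    "degenS3": (True, False, True),
    "degenLRS3": (True, True, True),
}


def degenerate(codon):
    """Degenerate codon representing every synonym of `codon`."""
    family = [c for c, aa in CODE.items() if aa == CODE[codon]]
    return "".join(IUPAC[frozenset(c[i] for c in family)] for i in range(3))


def recode(codon, strategy):
    """Use the degenerate codon only at the positions the strategy targets."""
    third, leu_arg, ser = STRATEGIES[strategy]
    aa = CODE[codon]
    if aa == "*":  # assumption: terminating codons are left as they are
        return codon
    positions = set()
    if third:
        positions.add(2)
    if leu_arg and aa in "LR":
        positions.add(0)
    if ser and aa == "S":
        positions.update((0, 1))
    full = degenerate(codon)
    return "".join(full[i] if i in positions else codon[i] for i in range(3))


def recode_sequence(seq, strategy):
    """Recode an in-frame, gap-free, unambiguous nucleotide sequence."""
    return "".join(recode(seq[i:i + 3], strategy) for i in range(0, len(seq), 3))
```

With these definitions, the leucine codon CTA becomes YTA under ‘degen1LR’, CTN under ‘degen3’ and YTN under ‘degenLRS3’. Gaps and pre-existing ambiguity codes, which a real alignment would contain, are deliberately not handled here.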

The maximum likelihood analysis of the original nucleotide protein-coding data was replicated using a partitioned model in which the parameters of the GTR + I + Γ model were estimated independently for each codon position, branch lengths being optimized jointly on the three partitions (‘unrecoded_p’ analytical strategy). This is also expected to provide more accurate results.

More sophisticated molecular evolution models were used, in order to take into account some aspects of composition heterogeneity. The CAT model implemented in Phylobayes (Lartillot and Philippe, 2004) allows different sites to evolve under different nucleotide equilibrium frequencies. The NDCH model implemented in P4 (Foster, 2004) allows different branches of the tree to evolve under different nucleotide equilibrium frequencies.

The analyses using Phylobayes version 3.2f (named ‘CAT’) were performed as follows: two Markov chain Monte Carlo (MCMC) chains were run under a GTR + I + Γ model, with the automatic stopping criterion based on the computation of convergence statistics between the two chains (‘maxdiff’ < 0.3 and ‘effective size’ > 50, checking every 100 cycles).

The analyses using P4 versions 0.88.r190 and 0.88.r186 (named ‘NDCH’) were performed as follows: four Metropolis-coupled Markov chain Monte Carlo (MCMCMC) chains were run under a GTR + I + Γ model until the log likelihood values of the cold chain appeared to have plateaued and the effective sample sizes (ESS) were higher than 50 (preferably at least 200, but that was not always achieved). A posterior predictive distribution (Bollback, 2002) of the X² composition heterogeneity statistic was generated using data simulated on the Markov chain samples (Foster, 2004). This distribution was compared to the X² of the empirical data to evaluate the ability of the model to account for composition heterogeneity across the tree. This comparison was implemented as a one-tailed area probability test (with a p-value deemed significant if smaller than 0.05). As long as the model generated data significantly less heterogeneous than the empirical data, additional composition vectors were added and new analyses were performed under a GTR + I + Γ + nCV model (where n is the total number of composition vectors), using the same procedure as above. In practice, two composition vectors proved enough for every analysed dataset. The results of the ‘NDCH’ analyses therefore correspond to Bayesian analyses under a GTR + I + Γ + 2CV model.
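The posterior predictive test described above can be sketched in Python as follows. This is only an illustration and not the P4 implementation: it assumes that the X² statistic is the usual chi-square statistic of a taxa × character-state table of counts, and the function names and the 0.05 default are introduced here for the example.

```python
def composition_chi2(counts):
    """Chi-square statistic of a taxa x states table of character counts."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(counts, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Count expected if all taxa shared the same composition.
            expected = row_total * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def tail_area_probability(empirical, simulated):
    """Fraction of simulated statistics at least as large as the empirical one."""
    return sum(s >= empirical for s in simulated) / len(simulated)


def model_fits_composition(empirical, simulated, alpha=0.05):
    """True when the test gives no reason to add another composition vector."""
    return tail_area_probability(empirical, simulated) >= alpha
```

Here `simulated` would hold the statistic computed on each dataset simulated from a posterior sample, and `empirical` the statistic of the real alignment. A small tail area means that the model rarely produces data as heterogeneous as the observed data.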

In order to achieve better comparability with bootstrap analyses, for both the ‘CAT’ and ‘NDCH’ analytical strategies, the coherence of the results was measured using a random selection of 200 post-burnin samples from the Markov chain.
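A minimal sketch of this subsampling step, assuming the chain is available as a Python list of sampled trees and that the burn-in is given as a number of samples to discard (the function name and the `seed` parameter are hypothetical):

```python
import random


def subsample_post_burnin(samples, burnin, size=200, seed=None):
    """Draw `size` distinct post-burn-in samples uniformly at random."""
    kept = samples[burnin:]
    if len(kept) < size:
        raise ValueError("not enough post-burn-in samples")
    # A dedicated generator keeps the draw reproducible for a given seed.
    return random.Random(seed).sample(kept, size)
```

Sampling without replacement gives 200 distinct chain states, which mirrors the 200 bootstrap trees of the maximum likelihood strategies only in number, not in statistical meaning.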

In addition, three other analytical strategies that were expected to have a worse performance than the ‘unrecoded’ maximum likelihood strategy were applied: a parsimony analysis (‘pars’) and two distance-based approaches, one using Jukes–Cantor distances (‘JCdist’) and the other LogDet distances (‘logdet’, Lockhart et al., 1994). All three strategies were performed using tools from the version 3.69 of the phylip package (Felsenstein, 2005) with 200 bootstrap pseudo-replicates of the data.

2.3 Coherence measure

The coherence of the results obtained by a given analytical strategy across a series of datasets could be evaluated in various ways, and the design of a method to accomplish such an evaluation could be a research topic in its own right. In the present work, the coherence was evaluated on the basis of pairwise topological distances between trees obtained by applying the analytical strategy on the datasets. The shorter the topological distances, the more ‘similar’ the trees, the more coherent the results.

The Robinson–Foulds symmetric difference (Robinson and Foulds, 1981) was used as topological distance. If A and B denote the sets of bipartitions defined by the branches of two trees on the same set of leaves, the Robinson–Foulds distance between these two trees is the number of bipartitions present in A or in B but not in both (see Fig. 1).
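The definition above translates almost directly into code. The sketch below assumes trees written as nested Python tuples of leaf names and treated as unrooted; this representation and the function names are choices made for the example, not those of the programs used in the study.

```python
def leaves(tree):
    """Set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Non-trivial bipartitions of the tree, seen as unrooted.

    Each bipartition is stored as the side that does not contain an
    arbitrary reference leaf, so that both trees use the same convention.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Sides with fewer than two leaves on either part are trivial.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in one tree but not in the other."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same set of leaves")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

For instance, `rf_distance((("A", "B"), ("C", "D"), "E"), (("A", "C"), ("B", "D"), "E"))` is 4, because the two trees each have two non-trivial bipartitions and share none of them.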

The coherence of a given analytical strategy can be evaluated using the Robinson–Foulds distances between pairs of trees obtained by applying that strategy on different datasets sharing the same set of leaves. If the sets of leaves differ between datasets, the trees have to be reduced to the set of common leaves. This might raise important issues that are beyond the scope of the present paper. Here, the four datasets delimited by concaterpillar included the same terminal taxa.

In the present work, the analytical strategies included the generation of 200 bootstrap trees or the extraction of 200 samples from a Markov chain. The coherence was therefore evaluated using the distances between pairs of consensus trees (4 consensuses of 200 trees because there are 4 datasets, which makes 6 pairs) and the full distribution of the distances between pairs of bootstrap or MCMC-sampled trees (200² = 40 000 distances for a pair of datasets, 6 pairs of datasets).
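The two levels of comparison described above can be sketched as follows, assuming each tree has already been reduced to its set of bipartitions and that the trees are stored in dictionaries keyed by dataset name. The names used and the way the 95% range is computed (simple order statistics) are assumptions of this example.

```python
from itertools import combinations, product
from statistics import mean


def rf(a, b):
    """Robinson–Foulds distance between two trees given as bipartition sets."""
    return len(set(a) ^ set(b))


def consensus_distances(consensus_by_dataset):
    """RF distance for every pair of datasets' consensus trees."""
    return {
        (d1, d2): rf(consensus_by_dataset[d1], consensus_by_dataset[d2])
        for d1, d2 in combinations(sorted(consensus_by_dataset), 2)
    }


def sample_distances(samples_by_dataset):
    """All RF distances between sampled trees, for every pair of datasets."""
    return {
        (d1, d2): [
            rf(s, t)
            for s, t in product(samples_by_dataset[d1], samples_by_dataset[d2])
        ]
        for d1, d2 in combinations(sorted(samples_by_dataset), 2)
    }


def summarise(distances):
    """Mean and central 95% range of a list of distances."""
    ordered = sorted(distances)
    low = ordered[int(0.025 * (len(ordered) - 1))]
    high = ordered[int(0.975 * (len(ordered) - 1))]
    return mean(ordered), low, high
```

With 4 datasets, `consensus_distances` returns 6 values, and with 200 trees per dataset each list returned by `sample_distances` holds 40 000 distances, matching the counts given above.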

3 Results

The coherence measures are reported in Fig. 2.

A striking observation is that the analytical strategies with the highest coherence are those in which the signal corresponding to synonymous substitutions at third codon positions has been eliminated (the names of these strategies end in ‘3’), together with the strategy using translated data (in which an important part of the variability corresponding to third codon positions is eliminated due to codon synonymy). The next most coherent strategy is the ‘NDCH’ strategy, which models the existence of composition heterogeneities across the phylogeny.

All of these strategies appear more coherent than the reference ‘unrecoded’ strategy, where all signal is present in the data and a simple composition-homogeneous GTR + I + Γ model is used.

This difference in performance is likely to reflect a real difference in reconstruction accuracy. Indeed, both the suppression of the signal associated with synonymy at third codon position and the modelling of composition heterogeneities can reduce the risks of obtaining reconstruction artefacts driven by convergences in genome composition biases, which we know affect the data used here (Li et al., in preparation). The recoding strategies leaving third positions unaffected do not seem to lead to a higher coherence than the ‘unrecoded’ strategy.

Contrary to what was expected, the analytical strategy where the model is partitioned by codon position (‘unrecoded_p’) does not seem to perform strikingly better than its non-partitioned counterpart (‘unrecoded’). The average RF distance between bootstrap consensuses is only slightly lower for ‘unrecoded_p’ than for ‘unrecoded’.

As expected, the analytical strategies using parsimony (‘pars’) and distance (‘JCdist’ and ‘logdet’) seem to produce results that are overall less coherent than those of the ‘unrecoded’ strategy. It was however not expected that the strategy using a site-heterogeneous model (‘CAT’) would also produce less coherent results than the ‘unrecoded’ strategy. It should be noted that the statistical significance of this observation is not known. It may be that the ‘CAT’ strategy reveals true divergences between the evolutionary histories of the different datasets (see Section 4).

The ‘NDCH’ strategy used the program P4, which can produce trees with polytomies during the Markov chain. A tree with polytomies has fewer bipartitions than a fully bifurcating tree. The fewer bipartitions there are in a pair of trees, the lower the upper bound on the RF distance between these trees. Indeed, RF is highest when the trees have no bipartition in common, and it is then equal to the sum of the numbers of bipartitions found in the two trees. This may bias the coherence evaluation used in the present work. The same problem holds for the ‘pars’ analytical strategy, because parsimony sometimes results in a set of equally parsimonious trees, and the (polytomous) consensus of these trees is used as the result of the analysis.

This may explain why the ‘NDCH’ strategy had the lowest average RF distance between individual sampled trees, although its RF distances between consensus trees were not the lowest. Similarly, this may explain why the average RF distance between individual trees for the ‘pars’ strategy was not the highest, whereas the distances between the bootstrap consensus trees were the highest for this strategy.

The distances between bootstrapped trees are on average higher than the distances between their consensuses. The bootstrapped trees for a given dataset are by nature dispersed, because they are built from different matrices obtained by resampling the sites of the original alignment, and these sites possibly support diverging topologies; their consensus, in contrast, is expected to reflect the signal present in the full alignment. If all datasets bear the same historical structure, and provided that this structure is efficiently revealed by the analytical strategy, a better coherence is expected between bootstrap consensuses than between the diverse bootstrapped trees.
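The bound mentioned above can be written out explicitly. In the following, A and B are the sets of non-trivial bipartitions of two trees on the same n leaves; the numerical value assumes that the 42 sequences of the test data correspond to 42 leaves.

```latex
\[
  d_{\mathrm{RF}}(A, B) \;=\; |A \,\triangle\, B|
  \;=\; |A| + |B| - 2\,|A \cap B|
  \;\le\; |A| + |B|
  \;\le\; 2\,(n - 3)
\]
% A fully bifurcating unrooted tree on n leaves has n - 3 non-trivial
% bipartitions, so for n = 42 the largest possible distance is
% 2 (42 - 3) = 78. Each polytomy removes bipartitions from |A| or |B|
% and lowers the attainable maximum accordingly.
```

The first inequality is an equality exactly when the two trees share no non-trivial bipartition, and the second when both trees are fully bifurcating.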

This discrepancy in coherence between individual trees and between their consensuses is not observed in the case of MCMC-sampled trees (‘NDCH’ and ‘CAT’). The trees sampled from a stationary MCMC are all supposed to be drawn from the a posteriori probability distribution of trees given the full data matrix. If this distribution is centered around one highly dominant most probable topology, the topological dispersion observed within the sample will be low. This is often observed: be it a true characteristic of Bayesian posterior distributions in phylogeny, or because of prevalent inefficiencies in MCMC mixing (Lakner et al., 2008), consensus trees obtained from MCMC samples often have high node posterior probabilities (Douady et al., 2003).

4 Discussion

The most accurate strategies should extract more historical signal than the others. Consequently, independent datasets derived from the same evolutionary history should produce more similar trees when using these strategies than when using less accurate analytical strategies (Fig. 3a). This seems to be the case to some extent with our data, because the analytical strategies that are designed to counter the misleading effects of convergence in composition bias (‘degen*3’, ‘translated’, and ‘NDCH’) tend to produce more coherent topologies than the other strategies. If the accuracy improvement brought by these strategies is important enough, the hypothesis can be made that their use in concaterpillar (instead of GTR + I + Γ maximum likelihood analyses on unrecoded data) could yield a smaller number of combined datasets, due to a higher compatibility between the phylogenies supported by the individual protein-coding genes (and their combinations) under strategies more apt to overcome reconstruction artefacts.

The coherence-based a posteriori approach will probably not perform well when the different datasets were generated by different histories. A good analytical strategy should produce coherent results only within a set of datasets sharing the same history. Otherwise, result dispersion is expected. It is difficult to tell how strong such dispersion will be relative to the dispersion due to the use of inaccurate strategies. Inaccurate strategies might then be difficult to distinguish from more accurate ones (Fig. 3b). The surprisingly high relative incoherence obtained using the ‘CAT’ strategy may be due to the existence of conflicts between datasets that can only be revealed when heterogeneity of composition across sites is taken into account. Such conflicts are not unexpected, due to the bacterial nature of the taxa used for the present tests. The type of pitfall described in Fig. 3b may be less of a problem with organisms not so prone to horizontal transfers of genetic material.

But even in cases where the datasets all result from the same history, a potential pitfall exists that depends on the way the less accurate analytical strategies are inaccurate. If they are inaccurate in a systematically biased way, then the trees may be similar due to shared wrongly inferred relationships (Fig. 3c). Including ‘control’ analytical strategies chosen for their known sensitivity to some specific systematically biased artefacts could help the detection of false positives due to these biases. If a control analytical strategy happens to produce more coherent results than other strategies that are a priori more likely to be accurate, this is a hint that the reconstruction artefact can affect some analyses, and that the coherence criterion should be considered with caution.

The coherence measure used here is based on RF distances, which makes it possibly biased, because analytical strategies producing less resolved trees might never look as incoherent as strategies producing fully bifurcating trees. However, it is not certain that this is an undesirable property. Indeed, one might want to favour analytical strategies that are conservative and avoid displaying relationships that are not well enough supported.

5 Conclusion

The coherence-based a posteriori approach tested here seems to behave partially as expected. Some improvements can probably be made in the way coherence is measured, but the potential pitfalls to which it might be sensitive seem difficult to rule out. The presented approach may therefore not be suitable as a selection tool in practice: different analytical strategies may be equally coherent but for different (bad or good) reasons. It may be advisable to use such a method as an exploratory tool rather than as a decision tool.

Acknowledgements

Thanks to João Sollari Lopes and Cymon J. Cox for assembling and providing the datasets that were used in this study. Thanks to Michel Laurin and the SFS for giving me the opportunity to present these ideas at the ‘Journées d’automne 2012 de la SFS’. Thanks to the editor and reviewers for their comments and advice that hopefully helped me improve the present paper.

This work was supported by a Fundação para a Ciência e a Tecnologia (FCT, Portugal) grant to Cymon J. Cox, Centro de Ciências do Mar (CCMAR) – CIMAR-Lab. Assoc., (PTDC/BIA-BCM/099565/2008).

References

Bollback J.P., 2002. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19, 1171–1180.

Brinkmann, Philippe, 1999. Archaea sister group of bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol. Biol. Evol. 16, 817–825.

Chen W.J., Bonillo, Lecointre, 2003. Repeatability of clades as a criterion of reliability: a case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Mol. Phylogenet. Evol. 26, 262–288.

Cox C.J., Foster P.G., Hirt R.P., Harris S.R., Embley M.T., 2008. The archaebacterial origin of eukaryotes. Proc. Natl. Acad. Sci. U.S.A. 105, 20356–20361.

Dettai, Lecointre, 2004. In search of notothenioid (Teleostei) relatives. Antarct. Sci. 16, 71–85.

Douady, Delsuc, Boucher, Doolittle, Douzery, 2003. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20, 248–254.

Felsenstein, 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376.

Felsenstein, 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.

Felsenstein, 2005. PHYLIP. Phylogeny Inference Package, Version 3.6. Department of Genome Sciences and Department of Biology, University of Washington, Seattle.

Foster P.G., 2004. Modeling compositional heterogeneity. Syst. Biol. 53, 485–495.

Goremykin V.V., Nikiforova S.V., Bininda-Emonds O.R.P., 2010. Automated removal of noisy data in phylogenomic analyses. J. Mol. Evol. 71, 319–331.

Hassanin, Léger, Deutsch, 2005. Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of Metazoa, and consequences for phylogenetic inferences. Syst. Biol. 54, 277–298.

Inagaki, Simpson A.G.B., Dacks J.B., Roger A.J., 2004. Phylogenetic artifacts can be caused by leucine, serine and arginine codon usage heterogeneity: Dinoflagellate plastid origins as a case study. Syst. Biol. 53, 582–593.

Lakner, van der Mark, Huelsenbeck J.P., Larget, Ronquist, 2008. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Syst. Biol. 57, 86–103.

Lartillot, Philippe, 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109.

Leigh J.W., Susko, Baumgartner, Roger A.J., 2008. Testing congruence in phylogenomic analysis. Syst. Biol. 57, 104–115.

Li, Lecointre, 2009. Formalizing reliability in the taxonomic congruence approach. Zool. Scr. 38, 101–112.

Li, B., Sollari Lopes, J., Foster, P.G., Embley, T.M., Cox, C.J. The origin of plastids: a case study of phylogenetic conflict among protein-coding genes and their proteins resulting from compositional biases at synonymous substitution sites (in preparation).

Lockhart P.J., Steel M.A., Hendy M.D., Penny, 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11, 605–612.

Miyamoto, Fitch, 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. 44, 64–75.

Miyamoto M.M., Allard M.W., Adkins R.M., Janecek L.L., Honeycutt R.L., 1994. A congruence test of reliability using linked mitochondrial DNA sequences. Syst. Biol. 43, 236–249.

Nabholz, Künstner, Wang, Jarvis E.D., Ellegren, 2011. Dynamic evolution of base composition: causes and consequences in avian phylogenomics. Mol. Biol. Evol. 28, 2197–2210.

Posada, 2008. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25, 1253–1256.

Robinson D.F., Foulds L.R., 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147.

Rota-Stabelli, Lartillot, Philippe, Pisani, 2013. Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study. Syst. Biol. 62, 121–133.

Roure, Philippe, 2011. Site-specific time heterogeneity of the substitution process and its impact on phylogenetic inference. BMC Evol. Biol.

Simmons M.P., 2012. Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28, 208–222.

Stamatakis, 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.

Yang, 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10, 1396–1401.

Yang, Rannala, 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14, 717–724.

Fig. 1

Robinson–Foulds symmetric difference distance (RF). A and B represent the sets of bipartitions defined by the branches of two trees. The grey area is the set of bipartitions that belong to either A or B but not to both. This set is the symmetric difference of A and B. The Robinson–Foulds distance between the two trees is the number of elements in this set.

Fig. 2

Coherence measures for the analytical strategies tested in this study. The vertical axis is in units of Robinson–Foulds topological distance (RF), that is, the number of bipartitions that are not shared by a pair of trees. The lower the value, the more similar the trees. The strategies are sorted by the mean (blue circles) of the six RF distances between pairs of consensus trees (red stars). Those on the left are therefore the most coherent and, presumably, the most accurate. The distribution of distances between individual bootstrapped or MCMC-sampled trees is represented by its 95% range and its mean (in green dots). See the text for an explanation of the names of the analytical strategies.
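The ranking described in this caption can be sketched as follows, assuming each consensus tree has already been reduced to its set of bipartitions. The strategy names, the string labels standing for bipartitions, and the function names are hypothetical and purely illustrative; four trees per strategy give the six pairwise distances mentioned above.

```python
from itertools import combinations
from statistics import mean


def rf(splits_a, splits_b):
    """RF distance between two trees given as sets of bipartitions."""
    return len(splits_a ^ splits_b)


def coherence(trees):
    """Mean RF distance over all unordered pairs of trees (lower is more coherent)."""
    return mean(rf(a, b) for a, b in combinations(trees, 2))


def rank_strategies(results):
    """Sort strategy names from most to least coherent."""
    return sorted(results, key=lambda name: coherence(results[name]))


# Hypothetical consensus trees on six taxa, one per dataset; each string
# is an arbitrary label for one bipartition (illustrative data only).
results = {
    "strategy_A": [
        {"ab|cdef", "abc|def", "ef|abcd"},
        {"ab|cdef", "abc|def", "ef|abcd"},
        {"ab|cdef", "abc|def", "ef|abcd"},
        {"ab|cdef", "abd|cef", "ef|abcd"},
    ],
    "strategy_B": [
        {"ab|cdef", "abc|def", "ef|abcd"},
        {"ac|bdef", "abc|def", "ef|abcd"},
        {"ad|bcef", "ade|bcf", "bc|adef"},
        {"ab|cdef", "abd|cef", "ef|abcd"},
    ],
}

print(rank_strategies(results))  # ['strategy_A', 'strategy_B']
```

In this toy example the first strategy recovers nearly the same tree from every dataset and is ranked first. As the text stresses, a low mean distance only suggests accuracy, since a systematic bias can also produce coherent results.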

Fig. 3

Possible cases for the coherence of results. The symbolic trees on the left represent evolutionary histories. The targets on the right represent the space of results. The datasets in the middle are the product (black arrow) of an evolutionary history, and are used to infer a result (dot on a target whose centre represents the true relationships) according to a given analytical strategy (green or red arrow). The more compact a set of dots, the more coherent the results it represents. (a) The datasets are the products of the same evolutionary history and no systematic bias repeatedly affects the reconstruction. A ‘good’ analytical strategy will generate results close to the true historical relationships. A ‘bad’ analytical strategy will produce more dispersed results. (b) The datasets are the products of different evolutionary histories. A ‘good’ analytical strategy will produce results close to the historical relationships that produced the dataset on which the inference is based. The results will therefore be situated in different areas of the result space. The results obtained by a ‘good’ analytical strategy will be harder to distinguish from those obtained by a ‘bad’ analytical strategy than when the datasets are the products of the same history. (c) The datasets are the products of the same evolutionary history and the analytical strategy is sensitive to a systematic inference error. The results will tend to concentrate around a zone of the result space corresponding to an artefactual reconstruction. This artefactual reconstruction will therefore correspond to a coherent set of results, and may be mistaken for the true evolutionary history.
